301 research outputs found
Efficient Bit-parallel Multiplication with Subquadratic Space Complexity in Binary Extension Field
Bit-parallel multiplication in GF(2^n) with subquadratic space complexity has been explored in recent years because of its lower area cost compared with traditional parallel multipliers. Based on the 'divide and conquer' technique, several algorithms have been proposed to build subquadratic space complexity multipliers. Among them, the Karatsuba algorithm and its generalizations are most often used to construct multiplication architectures with significantly improved efficiency. However, recursively using one type of Karatsuba formula may not result in an optimal structure for many finite fields. It has been shown that multiplier complexity can be improved by combining several methods. After a detailed study of existing subquadratic multipliers, this thesis proposes a new algorithm that finds the best combination of selected methods through comprehensive search for constructing polynomial multiplication over GF(2^n). Using this algorithm, improved architectures with a shortened critical path or reduced gate cost are obtained for a given value of n, where n is in the range [126, 600], reflecting the key sizes of current cryptographic applications. With different input constraints, the proposed algorithm can also yield subquadratic space multiplier architectures optimized for trade-offs between space and time. Optimized multiplication architectures over NIST-recommended fields generated by the proposed algorithm are presented and analyzed in detail. Compared with existing works with subquadratic space complexity, the proposed architectures are highly modular and have improved space or time complexity. Finally, generalization of the proposed algorithm to much larger fields is discussed.
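As a rough illustration of the divide-and-conquer idea behind these multipliers, the following sketch applies the classic two-way Karatsuba split to polynomials over GF(2), stored as Python integers whose bits are the coefficients. It is a software model only, not the thesis's search algorithm or any of its architectures: the function names and the `threshold` parameter are hypothetical, and reduction modulo a field polynomial (needed for a full GF(2^n) multiplier) is deliberately left out.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x] by shift-and-XOR (quadratic cost)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, n: int, threshold: int = 8) -> int:
    """Multiply two GF(2) polynomials of fewer than n bits each.

    Uses the two-way Karatsuba split: three half-size products
    instead of four. `threshold` (hypothetical tuning knob) is the
    size at which recursion falls back to the schoolbook method.
    """
    if n <= threshold:
        return clmul_schoolbook(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, n - half, threshold)
    # In characteristic 2, subtraction is also XOR, so the middle term
    # is (a_lo + a_hi)(b_lo + b_hi) + lo + hi.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    mid ^= lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))


# (x + 1) * (x + 1) = x^2 + 1 over GF(2)
print(bin(clmul_karatsuba(0b11, 0b11, 2, threshold=1)))
```

Mixing different split formulas at different recursion levels, as the abstract describes, would amount to choosing a different decomposition per call instead of always halving.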
The Natural Ecology and Stock Enhancement of the Edible Jellyfish (Rhopilema esculentum Kishinouye, 1891) in the Liaodong Bay, Bohai Sea, China
Among the edible jellyfish species, Rhopilema esculentum Kishinouye, 1891, is one of the most abundantly consumed, which makes it an important fisheries resource in China. The jellyfish fisheries in China show considerable annual fluctuations and have a very short season. In this chapter, we first review the natural ecology of R. esculentum, including its distribution and migration, growth model, and survival rate in the Liaodong Bay (LDB), based on the results of our field studies over more than 20 years. Second, we review the jellyfish fishery and population dynamics in the LDB. Third, we address survey methods, catch prediction, enhancement assessment, and fishery management, based on our survey results from 2005 to 2010. Finally, we present our field and experimental results on resource restoration. The high commercial value of R. esculentum enhancement in the LDB has made this a very successful enterprise.
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
Processing-in-memory (PIM), as a novel computing paradigm, provides
significant performance benefits from the aspect of effective data movement
reduction. SRAM-based PIM has been demonstrated as one of the most promising
candidates due to its endurance and compatibility. However, the integration
density of SRAM-based PIM is much lower than other non-volatile memory-based
ones, due to its inherent 6T structure for storing a single bit. Within
comparable area constraints, SRAM-based PIM exhibits notably lower capacity.
Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an
efficient algorithm/architecture co-design methodology that effectively doubles
the equivalent data capacity. At the algorithmic level, we propose a
filter-wise complementary correlation (FCC) algorithm to obtain a bitwise
complementary pair. At the architecture level, we exploit the intrinsic
cross-coupled structure of 6T SRAM to store the bitwise complementary pair in
their complementary states (), thereby maximizing the data
capacity of each SRAM cell. The dual-broadcast input structure and
reconfigurable unit support both depthwise and pointwise convolution, adhering
to the requirements of various neural networks. Evaluation results show that
DDC-PIM yields about speedup on MobileNetV2 and on
EfficientNet-B0 with negligible accuracy loss compared with PIM baseline
implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM
achieves up to and improvement in weight density and
area efficiency, respectively.
Comment: 14 pages, to be published in IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD)
PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment
Two-party computation (2PC) is promising to enable privacy-preserving deep
learning (DL). However, the 2PC-based privacy-preserving DL implementation
comes with high comparison protocol overhead from the non-linear operators.
This work presents PASNet, a novel systematic framework that enables low
latency, high energy efficiency & accuracy, and security-guaranteed 2PC-DL by
integrating the hardware latency of the cryptographic building block into the
neural architecture search loss function. We develop a cryptographic hardware
scheduler and the corresponding performance model for Field Programmable Gate
Arrays (FPGA) as a case study. The experimental results demonstrate that our
lightweight model PASNet-A and heavyweight model PASNet-B achieve 63 ms
and 228 ms latency on private inference on ImageNet, which are 147 and 40 times
faster than the SOTA CryptGPU system, and achieve 70.54% & 78.79% accuracy and
more than 1000 times higher energy efficiency.
Comment: Accepted for publication at DAC 2023; a short version was published at the AAAI
2023 workshop on DL-Hardware Co-Design for AI Acceleration: RRNet: Towards
ReLU-Reduced Neural Network for Two-party Computation Based Private Inference
PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference
The rapid growth and deployment of deep learning (DL) has raised emerging
privacy and security concerns. To mitigate these issues, secure multi-party
computation (MPC) has been discussed as a way to enable privacy-preserving DL
computation. In practice, MPC protocols often come with very high computation and
communication overhead, which may prevent their adoption in large-scale
systems. Two orthogonal research trends have attracted enormous interest
in addressing the energy efficiency in secure deep learning, i.e., overhead
reduction of MPC comparison protocol, and hardware acceleration. However, they
either achieve a low reduction ratio and suffer from high latency due to
limited computation and communication saving, or are power-hungry as existing
works mainly focus on general computing platforms such as CPUs and GPUs.
In this work, as the first attempt, we develop a systematic framework,
PolyMPCNet, of joint overhead reduction of MPC comparison protocol and hardware
acceleration, by integrating hardware latency of the cryptographic building
block into the DNN loss function to achieve high energy efficiency, accuracy,
and security guarantee. Instead of heuristically checking the model sensitivity
after a DNN is well-trained (through deleting or dropping some non-polynomial
operators), our key design principle is to enforce exactly what is assumed
in the DNN design -- training a DNN that is both hardware efficient and secure,
while escaping the local minima and saddle points and maintaining high
accuracy. More specifically, we propose a straight through polynomial
activation initialization method for cryptographic hardware friendly trainable
polynomial activation function to replace the expensive 2P-ReLU operator. We
develop a cryptographic hardware scheduler and the corresponding performance
model for the Field Programmable Gate Array (FPGA) platform.
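The abstract does not give the loss function, so the following is only a minimal sketch of one common way to fold hardware latency into a training objective: a softmax over per-layer architecture logits weights each candidate operator's latency, and the expected latency is added to the task loss. All names, the second-order form of the polynomial activation, the latency figures, and the weighting factor `lam` are assumptions for illustration, not values or APIs from PolyMPCNet.

```python
import math


def poly_activation(x: float, w2: float, w1: float, b: float) -> float:
    """Trainable second-order polynomial used in place of ReLU (assumed form)."""
    return w2 * x * x + w1 * x + b


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def latency_aware_loss(task_loss, arch_logits, op_latencies_ms, lam):
    """Task loss plus lam times the expected operator latency.

    arch_logits: one logit per candidate operator (e.g. ReLU vs. polynomial).
    op_latencies_ms: hypothetical latency of each candidate, as a hardware
    performance model might report it.
    """
    probs = softmax(arch_logits)
    expected_latency = sum(p * t for p, t in zip(probs, op_latencies_ms))
    return task_loss + lam * expected_latency


# Two candidates with equal logits: expected latency is the mean, 6.0 ms.
loss = latency_aware_loss(1.0, [0.0, 0.0], [10.0, 2.0], lam=0.1)
print(round(loss, 6))  # 1.6
```

Because the latency term is differentiable in the logits, gradient descent can shift probability toward the cheaper operator wherever accuracy allows, which is the general effect the abstract attributes to its hardware-aware loss.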
Genome-wide association study of maize resistance to Pythium aristosporum stalk rot
Stalk rot, a severe and widespread soil-borne disease of maize, reduces yield and quality worldwide. Recent reports show that Pythium aristosporum has emerged as one of the dominant causal agents of maize stalk rot. However, previous studies of maize stalk rot resistance mechanisms and breeding have mainly focused on other pathogens, neglecting P. aristosporum. Resistance breeding is the most economical and effective strategy for mitigating crop loss from this disease. In this study, we characterized resistance in 295 inbred lines using the drilling inoculation method and genotyped them by sequencing. By combining population structure and disease resistance phenotypes in a genome-wide association study (GWAS) using six statistical methods, we identified 39 significant single-nucleotide polymorphisms (SNPs) associated with P. aristosporum stalk rot resistance. Bioinformatics analysis of these SNPs revealed 69 potential resistance genes, among which Zm00001d051313 was further evaluated for its role in the host defense response to P. aristosporum infection. Through virus-induced gene silencing (VIGS) verification and physiological index determination, we found that transient silencing of Zm00001d051313 promoted P. aristosporum infection, indicating a positive regulatory role of this gene in maize’s antifungal defense mechanism. These findings will help advance our current understanding of the mechanisms underlying maize defense against Pythium stalk rot.